在图像理解项目中越来越多的情况下,场景图一代在电脑视觉研究中获得了很多关注,如视觉问题应答,图像标题,自动驾驶汽车,人群行为分析,活动识别等等。场景图,图像的视觉图形结构,非常有助于简化图像理解任务。在这项工作中,我们介绍了一个称为几何上下文的后处理算法,以了解视觉场景更好的几何上。我们使用该后处理算法在对象对与先前模型之间添加和改进几何关系。我们通过计算对象对之间的方向和距离来利用此上下文。我们使用知识嵌入式路由网络(KERN)作为我们的基准模型,将工作与我们的算法扩展,并显示最近最先进的算法上的可比结果。
translated by 谷歌翻译
A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasonable assumptions, a high-probability bound on the iteration complexity of the algorithm to approximate first-order stationarity is derived. Numerical results on standard nonlinear optimization test problems illustrate the advantages and limitations of our proposed method.
translated by 谷歌翻译
Optimal Power Flow (OPF) is a very traditional research area within the power systems field that seeks for the optimal operation point of electric power plants, and which needs to be solved every few minutes in real-world scenarios. However, due to the nonconvexities that arise in power generation systems, there is not yet a fast, robust solution technique for the full Alternating Current Optimal Power Flow (ACOPF). In the last decades, power grids have evolved into a typical dynamic, non-linear and large-scale control system, known as the power system, so searching for better and faster ACOPF solutions is becoming crucial. Appearance of Graph Neural Networks (GNN) has allowed the natural use of Machine Learning (ML) algorithms on graph data, such as power networks. On the other hand, Deep Reinforcement Learning (DRL) is known for its powerful capability to solve complex decision-making problems. Although solutions that use these two methods separately are beginning to appear in the literature, none has yet combined the advantages of both. We propose a novel architecture based on the Proximal Policy Optimization algorithm with Graph Neural Networks to solve the Optimal Power Flow. The objective is to design an architecture that learns how to solve the optimization problem and that is at the same time able to generalize to unseen scenarios. We compare our solution with the DCOPF in terms of cost after having trained our DRL agent on IEEE 30 bus system and then computing the OPF on that base network with topology changes
translated by 谷歌翻译
In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with traditional methods in rule learning to provide efficient and scalable algorithms for the classification of vast data sets, while remaining explainable. Apart from evaluating our approach on the common large scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherungs-Aktiengesellschaft which is an insurance company offering diverse services in Germany.
translated by 谷歌翻译
Network models are an essential block of modern networks. For example, they are widely used in network planning and optimization. However, as networks increase in scale and complexity, some models present limitations, such as the assumption of markovian traffic in queuing theory models, or the high computational cost of network simulators. Recent advances in machine learning, such as Graph Neural Networks (GNN), are enabling a new generation of network models that are data-driven and can learn complex non-linear behaviors. In this paper, we present RouteNet-Fermi, a custom GNN model that shares the same goals as queuing theory, while being considerably more accurate in the presence of realistic traffic models. The proposed model predicts accurately the delay, jitter, and loss in networks. We have tested RouteNet-Fermi in networks of increasing size (up to 300 nodes), including samples with mixed traffic profiles -- e.g., with complex non-markovian models -- and arbitrary routing and queue scheduling configurations. Our experimental results show that RouteNet-Fermi achieves similar accuracy as computationally-expensive packet-level simulators and it is able to accurately scale to large networks. For example, the model produces delay estimates with a mean relative error of 6.24% when applied to a test dataset with 1,000 samples, including network topologies one order of magnitude larger than those seen during training.
translated by 谷歌翻译
Our earlier research built a virtual shake robot in simulation to study the dynamics of precariously balanced rocks (PBR), which are negative indicators of earthquakes in nature. The simulation studies need validation through physical experiments. For this purpose, we developed Shakebot, a low-cost (under $2,000), open-source shake table to validate simulations of PBR dynamics and facilitate other ground motion experiments. The Shakebot is a custom one-dimensional prismatic robotic system with perception and motion software developed using the Robot Operating System (ROS). We adapted affordable and high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s^2 (1.2 g), and velocity of 0.5 m/s, when loaded with a 2 kg scale-model PBR. The perception system of the Shakebot consists of an accelerometer and a high frame-rate camera. By fusing camera-based displacements with acceleration measurements, the Shakebot is able to carry out accurate bed velocity estimation. The ROS-based perception and motion software simplifies the transition of code from our previous virtual shake robot to the physical Shakebot. The reuse of the control programs ensures that the implemented ground motions are consistent for both the simulation and physical experiments, which is critical to validate our simulation experiments.
translated by 谷歌翻译
Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
translated by 谷歌翻译
Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous works address this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images are rife with multiple visual cues from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for watermark, a shortcut we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released: https://github.com/facebookresearch/Whac-A-Mole.git.
translated by 谷歌翻译
Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts. In this work, we present a straightforward method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps. Our method differs from previous prediction-based navigation methods, such as frontier potential prediction or egocentric map completion, by directly predicting unseen targets while leveraging the global context from all previously explored areas. Our prediction model is lightweight and can be trained in a supervised manner using a relatively small amount of passively collected data. Once trained, the model can be incorporated into a modular pipeline for ObjectNav without the need for any reinforcement learning. We validate the effectiveness of our method on the HM3D and MP3D ObjectNav datasets. We find that it achieves the state-of-the-art on both datasets, despite not using any additional data for training.
translated by 谷歌翻译
We present a method that accelerates reconstruction of 3D scenes and objects, aiming to enable instant reconstruction on edge devices such as mobile phones and AR/VR headsets. While recent works have accelerated scene reconstruction training to minute/second-level on high-end GPUs, there is still a large gap to the goal of instant training on edge devices which is yet highly desired in many emerging applications such as immersive AR/VR. To this end, this work aims to further accelerate training by leveraging geometry priors of the target scene. Our method proposes strategies to alleviate the noise of the imperfect geometry priors to accelerate the training speed on top of the highly optimized Instant-NGP. On the NeRF Synthetic dataset, our work uses half of the training iterations to reach an average test PSNR of >30.
translated by 谷歌翻译